arXiv: 1506.04391 v3 [cs.CR] 21 Dec 2015 




CamFlow: Managed data-sharing 
for cloud services 

Thomas F. J.-M. Pasquier, Member, IEEE, Jatinder Singh, Member, IEEE, David Eyers, Member, IEEE,
and Jean Bacon, Fellow, IEEE


Abstract: A model of cloud services is emerging whereby a few trusted providers manage the underlying hardware and communications
whereas many companies build on this infrastructure to offer higher level, cloud-hosted PaaS services and/or SaaS applications.
From the start, strong isolation between cloud tenants was seen to be of paramount importance, provided first by virtual machines (VM) 
and later by containers, which share the operating system (OS) kernel. Increasingly it is the case that applications also require facilities 
to effect isolation and protection of data managed by those applications. They also require flexible data sharing with other applications, 
often across the traditional cloud-isolation boundaries; for example, when government, consisting of different departments, provides 
services to its citizens through a common platform. 

These concerns relate to the management of data. Traditional access control is application and principal/role specific, applied at policy 
enforcement points, after which there is no subsequent control over where data flows; a crucial issue once data has left its owner’s
control and is processed by cloud-hosted applications and within cloud services. Information Flow Control (IFC), in addition, offers
system-wide, end-to-end flow control based on the properties of the data. We discuss the potential of cloud-deployed IFC for enforcing owners’ data flow
policy with regard to protection and sharing, as well as safeguarding against malicious or buggy software. In addition, the audit log 
associated with IFC provides transparency and offers system-wide visibility over data flows. This helps those responsible to meet their 
data management obligations, providing evidence of compliance, and aids in the identification of policy errors and misconfigurations. 
We present our IFC model and describe and evaluate our IFC architecture and implementation (CamFlow). This comprises an OS level 
implementation of IFC with support for application management, together with an IFC-enabled middleware. 

Index Terms: Compliance, Security, Audit, Cloud Computing, Information Flow Control, Middleware, PaaS



1 Introduction and Motivation 

A model of cloud services is emerging whereby a
few trusted providers manage the underlying hardware
and communications infrastructure: datacenters
with worldwide replication to achieve high data integrity
and availability at low latency. Many companies
build on this infrastructure to offer higher level
cloud services; for example, Heroku is a PaaS built on
Amazon's EC2, above which SaaS offerings can be built
(e.g. the LIFX smart lightbulb cloud service on top of
the Heroku platform). From the start, protection was
a paramount concern for the cloud as infrastructure
is shared between tenants. Strong tenant isolation was
provided by means of totally separated virtual machines
(VMs) [1], [2] and more recently, isolated containers have
been provided that share a common OS kernel [3].

Increasingly, cloud-hosted applications may need not 
only protection (and isolation) from other applications 
but also have requirements for flexible data sharing, often 
across VM and container boundaries. An example is the 
UK GCloud¹ initiative, a government platform designed
to encourage small companies to provide cloud-hosted 
applications. These applications need to be composed 
and made to interoperate to support citizens' needs for 
online services. Similarly, the Massachusetts Open Cloud 
[4] is a marketplace (Open Cloud Exchange (OCX)) to
encourage small business development. Solutions are 
open and one may build on the services of another. The 
aim is to create a catalyst for the economic development 
of business clusters. 

End-users of cloud services still need to be assured
that their data is protected from leakage to other parties
by their cloud hosts, whether due to software bugs or
misconfigurations, and is safeguarded to the extent possible
against insider attacks and external threats. But increasingly,
they also need to be able to access their own
data across applications and to share their data with
others, according to the policies they specify. Containment
mechanisms, such as VMs and containers, provide
strong isolation between applications, but do not support
these sharing requirements. The incorporation of cloud
services within 'Internet of Things' (IoT) architectures [5]
is another driver of the requirement for both protection
and cross-application data sharing, given these IoT
architectures' strong emphasis on (safe) interaction. For
example, a patient being monitored at home may store
sensor-gathered medical data in the cloud and share it
with selected carers, medical practitioners, and medical
research (big-data) repositories, via cloud-hosted and
mediated services. Once data has left end-users' homes
for cloud services, they need to be assured that it is only
accessed as they specify.

1. https://www.gov.uk/digital-marketplace

Traditional access control tends to be principal/role
specific, and applies only within the context of a particular
application/service. Controls are applied at policy
enforcement points, after which there is no subsequent 
control over where data flows. Once data has left the 
direct control of its owner, for example, after being 
shared with others, it is difficult using traditional access 
controls to ensure and demonstrate that it is not leaked. 
If a leak is suspected, it often cannot be established 
whether this is a breach of confidentiality by a person or 
due to buggy or misconfigured cloud service software. 

Encryption offers protection by restricting access to 
intelligible data, even beyond the boundary of one's 
technical control. However, encryption hinders flexible,
nuanced data sharing, in that key management
(distribution, revocation) is difficult. Further, traceability is
limited: because the protection is purely mathematical, there
is generally no feedback as to when or where decryption occurs,
and a compromised key or broken encryption scheme at any
time in the future places data at risk. As such, it is
important that data flows are managed and audited, 
even if data items are encrypted. 

Although contracts exist between cloud providers and
tenants, and cloud services are increasingly subject to
regulation [6], there is at present no way to establish that
providers remain in compliance with these agreements
and requirements. Also, there are often requirements
that data should pass through certain processes, e.g.,
encryption or anonymisation. There is currently no clear
mechanism to express such requirements and demonstrate
that they have been consistently enforced.

An approach to maintaining the association of data
with policy is to use "sticky policies" [7]. Here,
owner-specified management constraints are attached to
encrypted data. Decryption is only allowed by parties
accepting the management constraints and able to enforce
them. This forms the basis for establishing contractual
relationships between data owners and service providers
or other applications. However, this approach requires
trust in a (relatively large amount of) software. Further,
the enforcement is either at too coarse a granularity or
prohibitively expensive. This is further explored in §2.4.

As an alternative, Information Flow Control (IFC)
augments traditional access control by offering continuous,
system-wide, end-to-end flow control based on
properties of the data; for example, "medical data may
only be used for research purposes after going through
consent checking and anonymisation". IFC allows security
contexts to be defined system-wide and guarantees
non-interference between them. This is achieved by tags
applied to entities (e.g., processes, files, database entries),
inseparable from the entities they are associated
with. Every exchange of data between entities is verified
against security-context-domain relationships created by
the tags, thus allowing tight control over any subsequent
transfers of the data.

In this paper we present CamFlow (Cambridge Flow 
Control Architecture). We outline CamFlow's IFC model 
and implementation which comprises a new operating 
system (OS) level implementation of IFC as a Linux 
Security Module (LSM), with support for application 
management, together with an IFC-enabled middleware. 
IFC tags are checked on OS system calls and on message
passing by the middleware, to determine whether
data flows are permissible. Log records can be made 
efficiently of all attempted flows, whether permitted 
or rejected, and this log provides a possible basis for 
audit, data provenance and compliance checking. By this 
means it can be checked whether application level policy 
has been enforced and whether cloud service provision 
has complied with contractual obligations. 

We argue that incorporating IFC into the underlying
PaaS-provided OSs, as a small trusted computing base,
would greatly enhance the trustworthiness of cloud
services, whether public or private, and hence all their
hosted services/applications. Our evaluation shows that
IFC would incur acceptable overhead, and our IFC
model is designed to ensure that application developers
need not be aware of IFC, although some application
providers may wish to take explicit advantage of IFC.
We demonstrate the feasibility of our approach via an
IFC-enabled framework for web services; see §7.

Contributions: Our main contribution is to demonstrate 
the feasibility of providing IFC as part of cloud software 
infrastructure and showing how IFC can be made to 
work end-to-end, system-wide. In addition to discussing 
the 'big picture', in this paper we also present a new 
kernel implementation of IFC and a new audit function. 
Our approach enables: (1) protection of applications 
from each other (non-interference); (2) flexible, managed 
data sharing across isolation boundaries; (3) prevention 
of data leakage due to bugs/misconfigurations; (4) extension
of access control beyond application boundaries;
(5) increased transparency, through detailed logs of
information flow decisions.

§2 gives background in protection and IFC, then §3
presents the essentials of the CamFlow IFC model, with
examples. §4 and §5 describe our new OS-level
implementation of IFC as an LSM and its integration via
trusted processes with an IFC-enabled middleware, storage
services, etc. §6 emphasises that audit in IFC systems
produces logs capable of being processed by 'big-data'
analytics tools. Audit is central to establishing provenance
and for providers to demonstrate compliance with
contract and regulation. §7 shows how standard web
services are supported transparently by the CamFlow
architecture: only a privileged application management
framework need be aware of IFC and unprivileged
application instances can run unchanged. In all cases,
evaluation is included within the section. §8 summarises,
concludes and suggests future work.

2 Background

We first define the scope of current isolation mechanisms,
highlighting the need for flexible data sharing
at application-level granularity, i.e. where applications
manage their own security concerns, as well as strong 
isolation between tenants and/or applications. As an 
introduction to IFC we outline the evolution of IFC 
models. Related work on IFC implementation at the OS 
level and within distributed systems is given with the 
relevant sections. We end with a brief comparison of IFC 
with taint tracking (TT) and sticky policies. 

2.1 IFC Models 

In 1976, Denning [8] proposed a Mandatory Access
Control (MAC) model to track and enforce rules on
information flow in computer systems. In this model,
entities are associated with security classes. The flow of
information from an entity a to an entity b is allowed
only if the security class of b is equal to or higher than
the security class of a. This allows the no read up, no
write down principle of Bell and LaPadula [9] to be implemented
to enforce secrecy. By this means a traditional military
classification public, secret, top secret can be implemented.
A second security class can be associated with each
entity to track and enforce integrity (quality of data);
no read down, no write up, as proposed by Biba [10]. A
current example might allow input of information from 
a government website in the .gov.uk domain but forbid 
that from "Joe's Blog". Using this model we are able 
to control and monitor information flow to ensure data 
secrecy and integrity. 
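
As a minimal sketch of these two rules (not code from any of the cited systems), the following Python fragment models security classes as a simple total order. The names `Secrecy`, `Integrity`, `secrecy_flow_allowed` and `integrity_flow_allowed`, and the two integrity classes, are illustrative assumptions.

```python
from enum import IntEnum


class Secrecy(IntEnum):
    """Illustrative military-style secrecy classes, lowest to highest."""
    PUBLIC = 0
    SECRET = 1
    TOP_SECRET = 2


class Integrity(IntEnum):
    """Illustrative integrity classes, lowest to highest (an assumption)."""
    UNTRUSTED = 0   # e.g. "Joe's Blog"
    TRUSTED = 1     # e.g. a website in the .gov.uk domain


def secrecy_flow_allowed(src: Secrecy, dst: Secrecy) -> bool:
    # Secrecy: data may only flow to an equal or higher class
    # (no read up, no write down).
    return dst >= src


def integrity_flow_allowed(src: Integrity, dst: Integrity) -> bool:
    # Integrity is the dual: data may only flow to an equal or lower class
    # (no read down, no write up).
    return dst <= src


if __name__ == "__main__":
    # A secret document may flow to a top-secret process, not to a public one.
    print(secrecy_flow_allowed(Secrecy.SECRET, Secrecy.TOP_SECRET))         # True
    print(secrecy_flow_allowed(Secrecy.SECRET, Secrecy.PUBLIC))             # False
    # A trusted receiver must not take input from an untrusted source.
    print(integrity_flow_allowed(Integrity.UNTRUSTED, Integrity.TRUSTED))   # False
```

A total order is the simplest case; Denning's model is more general, but the comparison performed on each flow has the same shape.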

In 1997 Myers [11] introduced a Decentralised IFC
model (DIFC) that has inspired most later work. This 
model was designed to meet the changing needs of 
systems from global, static, hierarchical security levels 
to a more flexible system, able to capture the needs 
of different applications. In this model each entity is 
associated with two labels: a secrecy label and an integrity 
label, to capture respectively the privacy/confidentiality 
of the data and the reliability of a source of data. Each 
label comprises a set of tags, each of which represents 
some security concern. Data is allowed to flow if the
secrecy label of the sender is a subset of the label of the
receiver, and conversely for integrity.

Implementations of a decentralised model akin to
Myers' include a sensitive embedded system for BMW
cars [12] and XBook [13] in a social media context.
Our own model is described in §3. When implemented
from the OS kernel level, applications running under
IFC enforcement do not need to be trusted for the data
management policy to be properly enforced [14].

2.2 Protection via VMs and Containers 

Isolation of tenants in cloud platforms is through
hypervisor-supported virtual machines [1], [2] or
OS-provided containers [3]. However, flexible sharing
mechanisms are also required to manage data exchange
between applications contributing to more complex
systems, or to achieve end-user goals. For example,
government applications might access citizens' records for
various purposes; a user's data from different
applications might together contribute to evidence related to
health or wellbeing.

At present, the sharing of information between
applications tends to involve a binary decision (i.e. to share
or not), as for example in Google pods (containers)².
Whole resources can be shared, but no control over data
usage between applications is provided. Furthermore,
there are no means for preventing leakage outside of
the mechanisms implemented by the individual
applications/services.

Solutions have been proposed to provide
intra-application sandboxes (down to individual end-users)
[15], but such schemes are difficult to scale, require
changes in application logic, and still do not provide
control beyond isolation boundaries (i.e. again, loss of
control once the data is shared).

IFC has been proposed to guarantee the proper usage
of data by social network applications [13]. The aim is to
provide purpose-based disclosure via IFC [16] between
isolated components, thus guaranteeing that shared data
can only be used for a well-defined and agreed-upon
purpose.

IFC is by no means proposed as a replacement for
access control, VMs or containers, but rather as a complement
to those techniques to provide flexible, managed
data-sharing. IFC would allow tenants and end-users to 
maintain control (within an IFC-enforcing world) and 
define policy applying to their data consistently and 
beyond isolation and application borders. 

2.3 Taint Tracking (TT) Systems 


Runtime, dynamic TT is similar to IFC but with less
functionality. TT systems use one tag type, "taint", instead of
secrecy and integrity tags. Tags propagate with data and
data flows may be logged. An entity that inputs tagged
data acquires the data's tag(s). Data flow constraints
are only enforced at specified sink points, for example,
when data attempts to leave a mobile phone [17]. Policy
is applied at sink points, such as preventing private,
unanonymised or unencrypted data from flowing, or
strictly controlling to where data may flow.

An example of TT used for integrity purposes is to
taint data from untrusted sources, e.g., user input from
a TCP stream in a web application environment, and
enforce that it is sanitised before being processed [18].
This simple mechanism prevents injection attacks that
plague badly designed web applications. An example of
TT used for confidentiality purposes is to taint sensitive
information, e.g., a list of contacts in a mobile phone, and
track it through this closed system [17]. Data leaving the
system (i.e. the phone) is analysed to ensure it does not
contain sensitive information. Data containing sensitive
information should only leave to a number of closely
controlled destinations, such as the cloud backup contact
list. This approach aids the detection of malicious
applications attempting to steal user-sensitive information
and send it to third parties. Equally, this type of concern
can be captured through the use of IFC policies.

2. https://cloud.google.com/container-engine/docs/pods/

One concern with TT systems is that there is a gap in
time between the occurrence of the issue (e.g. a leak, an
attack) and when it is detected [19], i.e. problems become
evident only when the tainted data reaches a sink
(enforcement point). Depending on the degree of isolation
between the different parts of the system, and the number
of system components involved, this tainted data
may have 'contaminated' much of the system. While
this can be managed in smaller, closed environments,
it is less appropriate for cloud services in general. IFC
policies present the clear advantage of preventing problems
as they occur and stopping their effects from propagating to a
potentially large part of the system.
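
The gap between contamination and detection can be illustrated with a toy model. This is a sketch under stated assumptions rather than the mechanism of any cited TT system: `Value`, `derive`, `send_to_sink`, `ALLOWED_SINKS` and the sink names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A piece of data together with the taint it has accumulated."""
    data: str
    taints: frozenset = frozenset()


def derive(*inputs: Value) -> Value:
    # Taint propagation: anything computed from tainted inputs is tainted.
    data = "".join(v.data for v in inputs)
    taints = frozenset().union(*(v.taints for v in inputs))
    return Value(data, taints)


# Hypothetical sink policy: the sinks that each taint is allowed to reach.
ALLOWED_SINKS = {"contacts": {"cloud-backup"}}


def send_to_sink(value: Value, sink: str) -> bool:
    # Enforcement happens only here; the intermediate flows were never checked.
    return all(sink in ALLOWED_SINKS.get(t, set()) for t in value.taints)


if __name__ == "__main__":
    contacts = Value("alice,bob", frozenset({"contacts"}))
    cache = derive(Value("cache:"), contacts)     # contaminated, unchecked
    report = derive(Value("report:"), cache)      # contamination spreads
    print(sorted(report.taints))                  # ['contacts']
    print(send_to_sink(report, "ad-server"))      # False: caught only at the sink
    print(send_to_sink(report, "cloud-backup"))   # True
```

In this sketch both `cache` and `report` are already contaminated by the time the sink check runs, whereas an IFC check would be applied at each of the two `derive` steps.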

Some argue that TT is simpler to use than IFC, and
incurs lower overhead, but when the enforcement is
systemic and the granularity identical the overheads
are similar (compare [17] and the evaluation in §4 and
§5). Indeed, the complexity of verifying IFC policy (see
§3) is comparable to the cost of propagating taint. For
both techniques, most of the overhead comes from the
mechanism for intercepting data exchange.

2.4 Sticky Policies 

IFC can be seen as a mechanism for enforcing policy; 
the labels associated with entities represent application 
policies. IFC is a simple, low-level mechanism. Sticky
policy approaches also consider the enforcement of data-bound
policy, but at a higher level.

Casassa-Mont et al. [20] first introduced sticky policies,
which involve encrypting data along with a list of policies
to be enforced on that data. To obtain the decryption
key from a Trusted Authority (TA), a party must agree to
enforce the policies associated with the data. This agreement
may be considered as part of forming a contractual
link between the data owner and the service provider.
Work has continued in the area [21]-[23]. Sticky policies,
typically enforced at the application-level, are generally
more complex and heavyweight than the simple secrecy
and integrity constraints of IFC. As such, sticky policies
tend only to be enforced at particular points, e.g. at
administrative boundaries. IFC, on the other hand, as
we show in §4.1.2, can be enforced continuously at a
reasonable cost. In §3 we discuss how complex policies
might be built from IFC labels.

Further, the sticky policy approach builds upon the
trust established between the data owner, the TAs and
services that use the data. A non-compliant service could
be black-listed, but only if and when a breach of agreement
is detected and the TA updated. Our IFC approach
builds only upon the trust between the data owner and
the cloud provider. Services and applications running on
top of the cloud provider platform need not be trusted.
We believe this to be a great improvement to the overall
trustworthiness of the system.

Fig. 1: An allowed safe flow and prevented flows.

3 CamFlow-Model: IFC for the Cloud 


IFC operates to ensure that only permitted flows of
information can occur, by enforcing data flow policy
dynamically, end-to-end, within and across applications/services.
Entities to which IFC constraints are applied
can include a MapReduce worker instance [24], a file, a
process, a database entry [25], etc. In CamFlow, IFC is
applied continuously, typically on every system call for an
IFC-enabled OS, and on communication mechanisms for
enforcement across applications/runtime environments.
IFC policy should therefore be as simple as possible,
to allow verification, human understanding and to
minimise runtime overhead. Indeed, there is no need for IFC
to encapsulate every possible policy; rather, it augments
other control mechanisms, and can help enforce their
policies.

3.1 Tags and Labels 


We define tags that are tokens, each representing some
security concern over secrecy or integrity. The tag
bob-private could for example represent Bob's personal
data. We associate every entity in the system with two
labels (sets of tags): an entity A has a secrecy label S(A)
and an integrity label I(A). The state of these labels is
the security context of the entity. The power of IFC is that
it guarantees non-interference between security contexts
[26], [27].

Example - secrecy: Suppose a patient, Bob, is discharged
from hospital to be medically monitored at home. The 
data streams from his sensors are transferred to a cloud 
service and are to be shared with his medical team at 
the hospital. The data items from his devices are tagged 
with medical, bob in their secrecy labels. 

Example - integrity: The cloud-based home monitoring 
support service needs to be assured that the data it 
receives is from a hospital-issued device. Each sensing 
device is checked and issued with the tag hospital-device 
in its integrity label. 

Fig. 1 illustrates information flow constraints being
applied over both secrecy and integrity dimensions. 

3.2 Decentralised Privileges and Security Contexts 


Fig. 2: Medical data declassified and endorsed for research purposes.

In decentralised IFC (DIFC) any active entity can create
new tags. Tag creation is typically carried out by
application managers when setting up application instances.
When an active entity creates a new tag, either for secrecy
or integrity, that entity is given the corresponding
privileges to add the tag to, and remove it from, its secrecy or
integrity label respectively. If an active entity A has a
privilege to add t to its secrecy label, we denote this
$t \in P_S^+(A)$, and to remove t from its secrecy label:
$t \in P_S^-(A)$ (and similarly $P_I^+(A)$ and $P_I^-(A)$ are the
privileges for integrity). An active entity may therefore have
four privilege sets in addition to its security context.
Application managers will normally set up application
instances in security contexts, without the privileges to
change them. An example is given in §7.
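
A minimal sketch of an active entity with its security context and four privilege sets might look as follows. The class and function names are hypothetical, and modelling a tag as a random 64-bit integer is an assumption made purely for illustration; this is not the CamFlow API.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An active entity: a security context plus four privilege sets."""
    name: str
    S: set = field(default_factory=set)        # secrecy label S(A)
    I: set = field(default_factory=set)        # integrity label I(A)
    PS_add: set = field(default_factory=set)   # P_S^+(A)
    PS_rem: set = field(default_factory=set)   # P_S^-(A)
    PI_add: set = field(default_factory=set)   # P_I^+(A)
    PI_rem: set = field(default_factory=set)   # P_I^-(A)


def create_secrecy_tag(creator: Entity) -> int:
    # Assumption: a tag is modelled as a random 64-bit integer.
    tag = secrets.randbits(64)
    # The creator receives both secrecy privileges over the new tag ...
    creator.PS_add.add(tag)
    creator.PS_rem.add(tag)
    # ... but its labels stay unchanged until it explicitly changes them.
    return tag


def create_integrity_tag(creator: Entity) -> int:
    # Same as above, for the integrity dimension.
    tag = secrets.randbits(64)
    creator.PI_add.add(tag)
    creator.PI_rem.add(tag)
    return tag


if __name__ == "__main__":
    manager = Entity("application-manager")
    bob = create_secrecy_tag(manager)
    print(bob in manager.PS_add, bob in manager.PS_rem)   # True True
    print(bob in manager.S)                                # False
```

Keeping privileges separate from labels is what lets a manager place an application instance in a security context without also handing it the ability to leave that context.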

3.3 Creating a New Entity 


We define $A \Rightarrow B$ as the operation of the entity A
creating the entity B. An example is creating a process
in a Unix-style OS by clone. We have the following rules
for creation:


if $A \Rightarrow B$, then


$S(B) := S(A)$
$I(B) := I(A)$


That is, the created entity inherits the security context
of its creator. These rules force the creating entity to
explicitly change its security context to that required for
the entity to be created. We motivate this below in §3.4.2.
Note that only labels pass to the created entity; privileges
have to be passed explicitly.
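
The creation rule can be sketched as below. This is an illustrative model only: the `Entity` class, the `create` function and the encoding of privileges as tuples are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    S: set = field(default_factory=set)            # secrecy label
    I: set = field(default_factory=set)            # integrity label
    # Hypothetical encoding of privileges, e.g. ("S", "+", "bob").
    privileges: set = field(default_factory=set)


def create(creator: Entity, name: str) -> Entity:
    # A => B: the new entity copies the creator's labels and starts
    # with no privileges; those have to be delegated explicitly.
    return Entity(name, S=set(creator.S), I=set(creator.I))


if __name__ == "__main__":
    manager = Entity(
        "manager",
        privileges={("S", "+", "medical"), ("S", "-", "medical"),
                    ("S", "+", "bob"), ("S", "-", "bob")},
    )
    # The manager first moves into the context required for the instance ...
    manager.S = {"medical", "bob"}
    instance = create(manager, "bob-monitoring-instance")
    # ... and can then return to its own context using its privileges.
    manager.S = set()
    print(sorted(instance.S))      # ['bob', 'medical']
    print(instance.privileges)     # set()
```

Because the instance holds no privileges, it cannot remove `medical` or `bob` from its own secrecy label.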

3.4 Security 


The purpose of IFC models is to regulate flows between
entities, and effect label changes and privilege
delegation.


Definition 1. A system is secure in the CamFlow IFC model
if and only if all allowed messages are safe (Definition 2), all
allowed label changes are safe (Definition 3) and all privilege
delegation is safe (Definitions 4 and 5).

3.4.1 Information Exchange 

IFC prevents data leakage by controlling the exchange
of information. We follow the classic pattern for
IFC-guaranteed secrecy (no read up, no write down [9]) and
integrity (no read down, no write up [10]).

Definition 2. A flow of information $A \to B$ is safe if and
only if:

$A \to B \iff S(A) \subseteq S(B) \land I(B) \subseteq I(A)$

Example - secrecy enforcement: Consider our example
of patient monitoring after discharge from hospital,
where the patient's devices are tagged with medical, bob
in their secrecy labels. In order for the cloud service to
be able to receive this data it must also include the tags
medical, bob in its secrecy label. Therefore an application
instance accessing Bob's medical data must be labelled as
such. In §7 we describe how applications can be designed
to meet such requirements.

Example - integrity enforcement: The cloud-based 
home monitoring support service needs to be assured 
that the data it receives is from a hospital-issued device.
To achieve this, the service has an integrity tag
hospital-issued in its integrity label and will only accept 
data from devices with tags hospital-issued. 
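
Definition 2 and the two enforcement examples can be written down directly as set comparisons. The sketch below is illustrative: `Context` and `flow_is_safe` are hypothetical names, and the entities are those of the running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Security context of an entity: secrecy label S and integrity label I."""
    S: frozenset = frozenset()
    I: frozenset = frozenset()


def flow_is_safe(a: Context, b: Context) -> bool:
    # A -> B is safe iff S(A) is a subset of S(B) and I(B) is a subset of I(A).
    return a.S <= b.S and b.I <= a.I


if __name__ == "__main__":
    device = Context(S=frozenset({"medical", "bob"}),
                     I=frozenset({"hospital-issued"}))
    service = Context(S=frozenset({"medical", "bob"}),
                      I=frozenset({"hospital-issued"}))
    other_app = Context()                                   # unlabelled
    rogue_device = Context(S=frozenset({"medical", "bob"}))  # no integrity tag

    print(flow_is_safe(device, service))        # True
    print(flow_is_safe(device, other_app))      # False: secrecy violated
    print(flow_is_safe(rogue_device, service))  # False: integrity violated
```

Both conditions are plain subset tests, which is one reason the check is cheap enough to be applied on every flow.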

3.4.2 Label Change 

Under the above constraints, information flows are 
restricted to equal or increasing secrecy constraints 
and equal or decreasing integrity constraints. However,
data may undergo transformations and/or checks that 
change its security properties. For example, moving data 
through an anonymisation engine renders the data less 
sensitive, so less strict secrecy constraints can apply 
to the anonymised output. In the integrity dimension, 
data may go through a validation process on input, 
thus becoming more trustworthy. In CamFlow only the 
process itself is able to change its secrecy and integrity 
labels, which requires the appropriate privileges and 
must be explicitly requested. 

Definition 3. A label change noted $A \leadsto A'$ is safe if and
only if for a label X (either S or I) and a tag t:

$X(A') := X(A) \cup \{t\}$ if $t \in P_X^+(A)$

OR

$X(A') := X(A) \setminus \{t\}$ if $t \in P_X^-(A)$


Declassifiers and endorsers are the entities with the 
privileges to perform security context transformations. 
Declassifiers change the secrecy properties and endorsers 
change the integrity properties. 

Example - declassification: A medical record system
is held in a private cloud. Research datasets may be
created from these records, but only from records where
the patients have given consent. Also, only anonymised
data may leave the private protected environment. We
assume a health service approved anonymisation procedure.
Fig. 2 shows the anonymiser inputting data tagged
as personal and declassifying the data by outputting data
with secrecy tag research.

Example - endorsement: In the same example, the 
Research Database is on a public cloud and may only 
receive research data tagged with consent, anon in its 
integrity label. In the private cloud we see a process
that selects appropriate records for specific research
purposes, checks for patient consent and adds
the tag consent to the integrity label of its output. The 
anonymiser process can only input data with this tag; it 
anonymises the data and outputs data with the tag anon 
in its integrity label. 
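
The declassification and endorsement examples can be traced with a small sketch of Definition 3. It is one reading of the example under stated assumptions: the `Entity` class, the tuple encoding of privileges and the exact label contents of each entity are hypothetical, chosen to match the tags named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    S: set = field(default_factory=set)           # secrecy label
    I: set = field(default_factory=set)           # integrity label
    # Privileges as (label, op, tag) triples, e.g. ("S", "-", "personal").
    privileges: set = field(default_factory=set)


def flow_is_safe(a: Entity, b: Entity) -> bool:
    # Definition 2: S(A) subset of S(B) and I(B) subset of I(A).
    return a.S <= b.S and b.I <= a.I


def change_label(e: Entity, label: str, op: str, tag: str) -> None:
    # Definition 3: the change is safe only with the matching privilege.
    if (label, op, tag) not in e.privileges:
        raise PermissionError(f"{e.name} lacks privilege {label}{op} on {tag!r}")
    target = e.S if label == "S" else e.I
    if op == "+":
        target.add(tag)
    else:
        target.discard(tag)


if __name__ == "__main__":
    record = Entity("consented-record", S={"personal"}, I={"consent"})
    anonymiser = Entity(
        "anonymiser", S={"personal"}, I={"consent"},
        privileges={("S", "-", "personal"), ("S", "+", "research"),
                    ("I", "+", "anon")},
    )
    research_db = Entity("research-db", S={"research"}, I={"consent", "anon"})

    print(flow_is_safe(record, anonymiser))        # True: input is allowed
    print(flow_is_safe(anonymiser, research_db))   # False: still personal

    # Explicit declassification (secrecy) and endorsement (integrity).
    change_label(anonymiser, "S", "-", "personal")
    change_label(anonymiser, "S", "+", "research")
    change_label(anonymiser, "I", "+", "anon")
    print(flow_is_safe(anonymiser, research_db))   # True
```

In this sketch the label changes stand in for the anonymisation work itself; what matters is that the output can only reach the research database after the anonymiser has explicitly exercised its privileges.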

Some previous work [14], [28] allows implicit
declassification and endorsement. That is, if an active entity has
the privilege to declassify/endorse and the privilege
to return to its original state (i.e. for declassification/endorsement
over t the entity has privileges $t^-$ and $t^+$),
the declassification/endorsement may occur implicitly
without the need for the entity to make the label changes.
We believe that this could in practice lead to unintentional
data disclosure. Suppose an entity has the privilege to
declassify top-secret information. The requirement for
explicit label change makes it unlikely that the entity will
send such data accidentally to an unintended recipient.
Our model has stronger constraints that require endorsement
and declassification operations to be programmed
explicitly.

3.4.3 Privilege delegation 

An entity is only able to delegate a privilege it owns.

Definition 4. A privilege delegation is safe if and only if

$t \in P_X^{\pm}(A)$, where A is the delegating entity.
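
As a sketch of the ownership rule for delegation (illustrative names only; the tuple encoding of a privilege is an assumption):

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    # Privileges as (label, op, tag) triples, e.g. ("S", "+", "bob").
    privileges: set = field(default_factory=set)


def delegate(giver: Entity, receiver: Entity, privilege: tuple) -> None:
    # Definition 4: only a privilege the giver itself owns can be delegated.
    if privilege not in giver.privileges:
        raise PermissionError(f"{giver.name} does not own {privilege}")
    receiver.privileges.add(privilege)


if __name__ == "__main__":
    manager = Entity("manager", privileges={("S", "-", "personal")})
    anonymiser = Entity("anonymiser")
    delegate(manager, anonymiser, ("S", "-", "personal"))
    print(("S", "-", "personal") in anonymiser.privileges)   # True
    try:
        delegate(anonymiser, manager, ("I", "+", "anon"))    # not owned
    except PermissionError as err:
        print("refused:", err)
```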

3.5 Conflict of Interest 

In CamFlow alone among IFC systems, privilege
delegation is further restricted by Conflict of Interest (CoI)
(or Separation of Duty (SoD)) enforcement. The receiving
entity A must not be put in a situation where it would
break a CoI constraint. By this means, an application
manager is prevented from creating an application
instance with access to conflicting data.

Definition 5. An entity A does not violate a CoI C if and
only if:

$\left| \left( S(A) \cup I(A) \cup P_S^+(A) \cup P_S^-(A) \cup P_I^+(A) \cup P_I^-(A) \right) \cap C \right| \leq 1$

Example - conflict of interest: A CoI might arise when
data relating to competing companies is available in a
system. In a hospital context, this might involve the
results of analysis of the usage and effects of drugs from
competing pharmaceutical companies. The companies
might agree to this analysis only if their data is
guaranteed to be isolated, i.e. not leaked to other companies.

Fig. 3: The interactions of the IFC Security Module (LSM)
and a Trusted Process within an OS.

The hospital may be participating in drug trials and
want to ensure that information does not leak between
trials: suppose a conflict is $C = \{Pfizer, GSK, Roche, \ldots\}$
and some data (e.g. files) are labelled PfizerData
$[S = \{Pfizer\}, I = \emptyset]$ and RocheData $[S = \{Roche\}, I = \emptyset]$. The
CoI described ensures that it is not possible for a single
entity (e.g. an application instance) to have access to
both RocheData and PfizerData either simultaneously or
sequentially, i.e. enforcing that Roche-owned data and
Pfizer-owned data are processed in isolation.
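
Definition 5 amounts to counting how many tags of the conflict set an entity touches through its labels or privileges. The following sketch applies that check before a delegation; `Entity`, `held_tags`, `violates_coi` and `delegate_secrecy_add` are hypothetical names, not the CamFlow interface.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    S: set = field(default_factory=set)
    I: set = field(default_factory=set)
    PS_add: set = field(default_factory=set)   # P_S^+
    PS_rem: set = field(default_factory=set)   # P_S^-
    PI_add: set = field(default_factory=set)   # P_I^+
    PI_rem: set = field(default_factory=set)   # P_I^-


def held_tags(e: Entity) -> set:
    # Every tag the entity holds, in a label or in a privilege set.
    return e.S | e.I | e.PS_add | e.PS_rem | e.PI_add | e.PI_rem


def violates_coi(tags: set, conflict: set) -> bool:
    # Definition 5: at most one tag of the conflict set may be held.
    return len(tags & conflict) > 1


def delegate_secrecy_add(giver: Entity, receiver: Entity, tag: str,
                         conflicts: list) -> None:
    if tag not in giver.PS_add:
        raise PermissionError("giver does not own the privilege")
    # Check the receiver's would-be state before changing anything.
    if any(violates_coi(held_tags(receiver) | {tag}, c) for c in conflicts):
        raise PermissionError("delegation would violate a conflict of interest")
    receiver.PS_add.add(tag)


if __name__ == "__main__":
    C = {"Pfizer", "GSK", "Roche"}
    pfizer_owner = Entity("pfizer-owner", PS_add={"Pfizer"})
    roche_owner = Entity("roche-owner", PS_add={"Roche"})
    app = Entity("analysis-instance")

    delegate_secrecy_add(pfizer_owner, app, "Pfizer", [C])     # accepted
    try:
        delegate_secrecy_add(roche_owner, app, "Roche", [C])   # refused
    except PermissionError as err:
        print("refused:", err)
```

Because privileges count towards the check as well as labels, the instance cannot obtain access to Roche data later while it still holds the Pfizer privilege, which covers the sequential case in the example.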

The next sections describe the CamFlow platform that 
enforces the IFC constraints described. 


4 OS Enforcement 

At the heart of the architecture is a minimal kernel module
dedicated solely to OS-level IFC enforcement. The
module is trusted to enforce IFC, transparently, across
all flows between entities within the OS. User space
processes can directly interact with the kernel module,
e.g. to delegate privileges (§3.4), through a pseudo-file
system, abstracted through a high level API. Higher level
considerations and policies can be managed through
specifically defined Trusted Processes (see §4.2). The local
machine architecture is represented in Fig. 3.

Note that IFC operates alongside and complements 
other security technologies. It is not a cloud security 
panacea; challenges regarding covert and side channels, 
and direct access to hardware by an attacker remain, as 
they do for systems in general. There are approaches 
that can help address these security threats, but many 
are highly disruptive (e.g. synchronisation approaches 
to reducing timing channels) and are infrequently used. 
Other threats may be easier to mitigate and solutions 
may be used when appropriate (e.g. on-disk encryption). 

4.1 CamFlow-LSM 


Our kernel module, CamFlow-LSM, is implemented as a
Linux Security Module (LSM) [29]. Although our work is
Linux-specific, a similar approach could be used on any
system providing LSM-like security hooks. Unlike other
DIFC OS implementations [14], [28], our kernel patch is
self-contained, strictly limited to the security module,
does not modify any existing system calls and follows
LSM implementation best practice. This allows, among
other things, LSM stacking [30], [31] and coexistence
with other security modules such as SELinux [32]
or AppArmor [33], and complements their MAC
enforcement with decentralised information flow policies.

We assume that the rest of the kernel can be trusted
and does not interfere with the IFC enforcement mechanism.
LSM system hooks have been statically and
dynamically verified [34]-[36], and our implementation
inherits from LSM the formal assurance of IFC's correct
placement on the path to any controlled kernel object.
This is sufficient to guarantee that we control flow and
record audit on any operation on a controlled kernel
object.

Since applications running on SELinux [32] or
AppArmor [33] need not be aware of the MAC policy
being enforced, we see no reason to force applications
running on an IFC system to be aware of IFC; only those
performing declassification or endorsement operations
are necessarily aware. This implementation choice is
important; cloud providers can incorporate IFC without
requiring changes in the software deployed by tenants.
Alternatively, applications that wish to manage
their own IFC constraints can declare policy through a
pseudo-filesystem (as is typical for LSMs) abstracted by
a user space library and enforced transparently by the
IFC mechanism.

The LSM framework calls security hooks when access 
to a kernel object is attempted. Security metadata can be 
associated with kernel objects and is used by the LSM 
module to make access decisions. Tags and privileges 
are represented by 64-bit opaque nonces associated with 
kernel objects such as processes, inodes, files, shared 
memory objects, messages etc. On interaction between 
kernel objects, CamFlow-LSM security hooks are called 
to enforce data-flow policy (§3.4.1) or propagate tags on
entity creation (§3.3) as appropriate.

Only active entities (processes) have mutable labels
and privileges; all other (passive) entities have immutable
labels and no privileges.

Privileges are allocated by the kernel and owned by
the creating process (any process can create tags and the
associated privileges in a decentralised fashion). Privileges
can be passed to other processes, users or groups,
with CamFlow-LSM verifying that constraints on privilege
delegation (§3.4.3) and conflict of interest (§3.5) are not
violated. A process can add or remove a tag from its
label if it owns the appropriate privilege (following IFC
constraints described in §3.4.2), if the current user owns
the privilege or if the current group owns the privilege.
How tags are shared and managed must be considered
with care when designing an application and the system
must be administered accordingly.
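As an illustration of privilege-gated label changes, the following is a minimal sketch under stated assumptions. It models only the secrecy label, represents tags as random 64-bit integers, and keeps separate "add" and "remove" privilege sets per process. The class and method names are hypothetical. User and group ownership, and the delegation and conflict-of-interest checks of §3.4.3 and §3.5, are deliberately left out.

```python
# Hypothetical sketch of privilege-gated label changes (not CamFlow's API).
import secrets


def new_tag():
    """Allocate an opaque 64-bit nonce to act as a tag."""
    return secrets.randbits(64)


class Process:
    def __init__(self):
        self.secrecy = set()       # secrecy label S
        self.add_priv = set()      # tags this process may add to S
        self.remove_priv = set()   # tags this process may remove from S

    def create_tag(self):
        """Any process may create a tag; the creator owns both privileges."""
        tag = new_tag()
        self.add_priv.add(tag)
        self.remove_priv.add(tag)
        return tag

    def add_secrecy(self, tag):
        if tag not in self.add_priv:
            raise PermissionError("missing add privilege")
        self.secrecy.add(tag)

    def remove_secrecy(self, tag):
        if tag not in self.remove_priv:
            raise PermissionError("missing remove privilege")
        self.secrecy.discard(tag)

    def delegate_add(self, other, tag):
        """Pass on the add privilege for a tag this process owns."""
        if tag not in self.add_priv:
            raise PermissionError("cannot delegate a privilege not owned")
        other.add_priv.add(tag)


owner, worker = Process(), Process()
tag = owner.create_tag()
owner.delegate_add(worker, tag)   # worker may now raise its secrecy
worker.add_secrecy(tag)           # allowed
# worker.remove_secrecy(tag) would raise: it cannot declassify
```

Here the worker can enter the security context but cannot leave it, since only the add privilege was delegated.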

4.1.1 Checkpointing and Restoration 


Checkpointing a process involves halting its execution,
allowing it to be restarted at a later stage, and enabling
migration, e.g. [37]. LSM state is normally saved and
restored by the checkpointing system, e.g. [38], and
our module further exports an API to more efficiently
serialise and restore security context.

Fig. 4: Overhead introduced into the OS by CamFlow-LSM
for sys_clone, sys_read, sys_write and sys_pipe (series:
dyn. label, IFC LSM, Native; x-axis: time in µs).

Furthermore, self-checkpointing and restoring the previous
state of a process has been demonstrated [39] to be
a beneficial feature for IFC systems. This is particularly
useful for processes serving requests. In such a scenario
the state of the process is saved after initialisation. When
a request is received, the serving process sets itself up
in the security context appropriate to serve the request.
After the request is served (or a series of requests if the
system is session-based as described in §7), the process
restores its memory state and security context to what
they were immediately after initialisation. This improves
performance and prevents data leaks between security
contexts.
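The checkpoint-then-restore pattern for a request-serving process can be sketched as follows. This is a user-space analogy only: it snapshots Python objects with `copy.deepcopy`, whereas the mechanism described here restores the memory state and security context of a real process. The `Worker` class and its attributes are hypothetical.

```python
# Hypothetical user-space analogy of self-checkpointing; not CamFlow code.
import copy


class Worker:
    def __init__(self):
        self.cache = {}          # stands in for process memory
        self.secrecy = set()     # stands in for the security context
        self._checkpoint = None

    def checkpoint(self):
        """Save state once, right after initialisation."""
        self._checkpoint = copy.deepcopy((self.cache, self.secrecy))

    def restore(self):
        """Return to the state saved by checkpoint()."""
        self.cache, self.secrecy = copy.deepcopy(self._checkpoint)

    def serve(self, user, payload):
        self.secrecy = {user}            # enter the requester's context
        self.cache[user] = payload       # request-specific state
        result = f"{user}:{payload}"
        self.restore()                   # nothing survives into the next request
        return result


worker = Worker()
worker.checkpoint()
first = worker.serve("alice", "record-1")
second = worker.serve("bob", "record-2")
```

After each call to `serve`, the worker holds no data and no tags from the previous request, which is the property that prevents leaks between security contexts.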



4.1.2 OS Evaluation 


We tested the CamFlow-LSM module on Linux Kernel
version 3.17.8 (01/2015) from the Fedora distribution.³
The tests are run on an Intel 2.2 GHz i7 CPU and 6 GiB
RAM machine.


Measurements are done using the Linux tool ftrace [40]
to provide a microbenchmark. Two processes read from
and write to a pipe respectively. Each has 20 tags in its
security label, substantially more than we have seen a
need for in current use cases. We measure the overhead
induced by: creating a new process (sys_clone), creating
a new pipe (sys_pipe), writing to the pipe (sys_write) and
reading from the pipe (sys_read). The results are given
in Fig. 4.

We can distinguish two types of induced overhead: 
verifying an IFC constraint (sys_read, sys_write) and allocating
labels (sys_clone, sys_pipe). The sys_clone overhead
is roughly twice that of sys_pipe as memory is allocated 
dynamically for the active entity's labels and privileges. 
Recall that passive entities have no privileges. Overhead 
measurements for other system calls / data structures are 
essentially identical as they rely on the same underlying 
enforcement mechanism, and are not included. 

The CamFlow-LSM overhead is a few percent, see
Fig. 4. We provide a build option that further improves
performance by declaring labels and privileges with a
fixed maximum size (by default, label size can increase
dynamically to meet application requirements). This reduces
the overhead of the system calls that create new
entities (the dynamic label component in Fig. 4). This is
an acceptable trade-off as in practical scenarios, labels
rarely exceed five tags. However, for most
applications, the overhead is imperceptible and lost in
system noise; it is hard to measure without using kernel
tools, as the variation between two executions may be
greater than the overhead.

3. It is not feasible to provide a comparison with the Laminar
implementation [28], that is closest in technical terms to our work,
as the implementation available at https://github.com/ut-osa/laminar
is for an obsolete kernel version 2.6.22 (07/2007).


4.2 Trusted Processes 


The CamFlow-LSM is trusted to enforce IFC at the kernel
level. Its functionality is minimal; strictly confined to
the enforcement of IFC policies as described in §3. This
guarantees easier maintainability and a system that is
agnostic to higher level application requirements, thus
minimising the constraints imposed on user-space application
design.


We introduce the concept of a trusted process, that allows
application/platform-specific concerns to be managed
in user space by bypassing some LSM-enforced IFC
constraints. For example, a trusted process might serve
as a proxy for external connections, as in the Trusted IFC
Gateway in the example in §7, setting up and managing
application components' labels. Trusted processes are
used to interact with persistent storage (see §5.3), for
checkpointing and restoring processes (see §4.1.1) and
for managing inter-process and external communication
(see §5).

Fig. 5 shows OS instances running the CamFlow-LSM
hosting a number of application processes, that
may be grouped in containers. Each OS instance has
a single trusted process (Security Context Manager) to
manage its hosted processes' IFC labels and privileges. In
addition, each process has an associated trusted middleware
process to handle inter-process and inter-machine
communication. Such communication may be within or
between containers, OSs or clouds.


In this example, $S$ represents a particular set of secrecy
tags, and $I$ a particular set of integrity tags, both of
which remain the same throughout. The application processes
and other OS objects, such as pipes and files, are
labelled $[S, I]$. The process labelled $[\emptyset, I]$ writes 'public'
data to a pipe, which is read by a process labelled $[S, I]$,
assuming all the $I$ tags match correctly. Similarly, two
processes are shown writing to and reading from a file.
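The pipe example can be checked with a minimal sketch of the usual IFC safe-flow rule, assumed here to correspond to the constraint of §3.4.1: a flow from A to B is allowed when $S(A) \subseteq S(B)$ and $I(B) \subseteq I(A)$. The tag names and the label given to the pipe are illustrative assumptions.

```python
# Hypothetical sketch of the safe-flow check; tag names are made up.

def can_flow(src, dst):
    """Allow src -> dst iff S(src) is a subset of S(dst)
    and I(dst) is a subset of I(src)."""
    (s_src, i_src), (s_dst, i_dst) = src, dst
    return s_src <= s_dst and i_dst <= i_src


S = frozenset({"medical"})       # a particular set of secrecy tags
I = frozenset({"validated"})     # a particular set of integrity tags
EMPTY = frozenset()

writer = (EMPTY, I)   # process writing 'public' data
pipe = (EMPTY, I)     # assumed to carry the writer's label
reader = (S, I)       # process in the more secret context

write_ok = can_flow(writer, pipe)      # public data enters the pipe
read_ok = can_flow(pipe, reader)       # secret reader may read public data
leak_ok = can_flow(reader, pipe)       # secret data must not flow back
```

The asymmetry between `read_ok` and `leak_ok` is the point of the example: data may move into a more constrained secrecy context, but not out of it without declassification.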


The Security Context Manager maps between the 
kernel-level representation of tags (as 64-bit integers) 
and the representation of tags in user space. Within a 
cloud or other trusted environment, tags may be simple 
strings. When tags need to cross domain boundaries,
e.g., when cloud services form part of a wider architecture,
as in IoT, tags may need to be protected by
cryptographic means (see §5.1).

Trusted processes are either set up through static
configuration, read at boot time by the CamFlow-LSM
module, or created at runtime by another trusted process.
Trusted processes must be managed by a
trusted party (in our current approach the underlying
infrastructure provider), and/or their code must be auditable,
with a means provided to verify the version currently
running on the platform (see §4.3).


4.3 Leveraging Hardware Roots of Trust 

Incorporating IFC into cloud-provider OSs would enhance
the trustworthiness of the platform. However,
IFC only guarantees protection above the technical layer 
in which it is enforced. Recent hardware and software 
developments make it possible to attest that the software 
layers on which our platform runs have been audited. 

The Trusted Platform Module (TPM) [41], as used for
remote attestation [42], is one such hardware mechanism.
TPM is used to generate a nearly unforgeable
hash representing the state of the hardware and software
of a given platform, that can be remotely verified.
Therefore, a company could audit the implementation 
of our IFC enforcement mechanism and ensure that our 
kernel security module, messaging middleware and the 
configuration they provide are indeed running on the 
platform. Any difference between the expected state of 
the software stack and the platform could be considered 
a breach of trust; such considerations can easily be 
embedded in the contractual obligations of the cloud 
provider. 


TPM and remote attestation for cloud computing [43]
are reaching maturity, with IBM rolling out an open
source, scalable trusted platform based on virtual TPMs
[44]. Indeed, Berger et al. [44] describe a mechanism
allowing the TPM and remote attestation to be provided
for virtual machine offerings and container-based solutions,
covering the whole range of contemporary cloud
offerings. Furthermore, the approach not only allows the 
state of the software stack to be verified at boot time, but 
also during execution, and can thus prevent run-time 
modification of the system configuration. 


5 Cross-Machine Enforcement 


CamFlow-LSM operates to protect flows within the OS. 
However, it is also important that flows are protected 
across OS instances. 

Generally, in order to guarantee flow constraints, only
processes $P$ such that $S(P) = \emptyset$ and $I(P) = \emptyset$, i.e. not
subject to IFC constraints, are allowed to directly connect
to or receive messages from connections on a remote OS
(e.g. through a socket). In order to connect to another
machine, a process must either: 1) be able to declassify
to change its security context to $S(P) = \emptyset$ and $I(P) = \emptyset$;
or 2) communicate through an intermediate trusted process.

As such, CamFlow contains an IFC-enabled, fully-featured
messaging middleware (CamFlow-MW) to both
facilitate communication and guarantee enforcement
across machines. For want of space, we only consider
the middleware concepts relevant to IFC; details on the
general middleware (as it was prior to IFC/CamFlow
integration) can be found in [45]. In short, the middleware
supports strongly-typed messages; a range of interaction
paradigms, including request-reply, broadcast,
and streams; flexible resource discovery; and security
mechanisms including access controls and encrypted
communication. A particular feature is its support for
dynamic reconfiguration based on event-driven policy.
This simplifies both application development and deployment,
as concerns can be abstracted and tailored
to the particular environment, rather than embedded
within application code.

Fig. 5: CamFlow Architecture: Labelled OS objects, trusted processes and communication middleware.

The role of the middleware is to move towards continuous,
end-to-end data flow management, such that IFC
can be enforced across applications/machines (kernels).
There is existing work on IFC enforcement across machines;
however, these approaches impose specific requirements, such as
design-time considerations [46], a particular language/
runtime [47], or constraints on system architecture/
implementation [48]. In contrast, we integrate IFC functionality
into the general, fully featured distributed systems
middleware mentioned above (see [49]), to provide
flexibility and be more generally applicable. We deliberately
avoid imposing a structure on system design,
instead integrating IFC functionality into the sort of communications
infrastructure common to current enterprise
and cloud systems.

5.1 Remote Interactions 


CamFlow-MW operates by associating a trusted process
(see §4.2) with an entity that seeks to communicate via
the messaging system. Its fit within the broader architecture
is depicted in Fig. 5. The process is responsible for
handling the communication of messages, and enforces
IFC based on the current runtime labels of the entity on
whose behalf it operates.

It follows that for IFC to be enforced across machines, 
tags require system-wide management, i.e. throughout 
the cloud service. In [50] we proposed that the widely
used and available X.509 certificates could be used. The
approach relies on public key certificates and attribute
certificates [51], to respectively identify the application
associated with the CamFlow-MW instance and the tags 
associated with this application. 

As part of establishing a connection, CamFlow-MW
ensures that each entity authorises communication with
the other, according to a local access control policy. Decisions
are based on component metadata, the relevant
authentication aspects secured through PKI (certificates).
Similarly, IFC policy must also be verified to ensure data 
flows are authorised according to IFC policy. Attribute 
certificates provide cryptographic means to determine
and verify the tags associated with the remote entity
(see [50]), on which policy can be enforced. If tags do
not accord, the connection will not be established.

We also see potential for remote attestation, based on
hardware integrity measures (see §4.3), to be integrated
into this authorisation phase, to ensure the remote machine
operates a reliable IFC enforcement regime.


5.2 Message-Level Enforcement 

CamFlow messages are strongly typed, where a message
type is defined by a schema describing its set of
attributes. For an instance of a message, an attribute 
consists of a name, type and value. The support for 
IFC within messages is fine-grained, in that individual 
attributes within messages can also be labelled. These 
attribute labels introduce additional IFC constraints over 
and above those already applying to the entity, i.e. as 
recognised by the kernel-LSM, and validated on connection
establishment.

Labels can be defined within message type schema, 
which sets the attributes' IFC labels for all message 
instances of the type. These labels cannot be changed by 
entities dealing in such messages, and the entities must 
hold the requisite labels to interact with the attributes. 
Otherwise, the entity producing/publishing a message 
can set the security labels for the attributes (for those not 
predefined), if the entity holds the associated privileges. 
Enforcement occurs as follows: 

Receiving: If the receiving entity's labels do not agree 
with those of an attribute value, the attribute value (and 
any sub-attributes) are removed from (made null in) the 
message. This is enforced on message receipt, before it 
is delivered to the entity. 

Sending: An entity cannot send values for attributes 
where its labels do not agree with those of the attribute. 
This is enforced when an entity attempts to send a 
message, ensuring values for any attributes violating this 
policy are removed, before message propagation. 
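The receiving and sending rules can be sketched as attribute-level filters. This is a minimal sketch under stated assumptions: labels "agree" is interpreted as the safe-flow rule (secrecy of the source contained in that of the destination, integrity of the destination contained in that of the source), messages are plain dictionaries, and the names `Label`, `filter_on_receive` and `filter_on_send` are hypothetical rather than CamFlow-MW's interface.

```python
# Hypothetical sketch of attribute-level IFC filtering; not CamFlow-MW's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    secrecy: frozenset = frozenset()
    integrity: frozenset = frozenset()


def can_flow(src, dst):
    """Assumed agreement test: the safe-flow rule between two labels."""
    return src.secrecy <= dst.secrecy and dst.integrity <= src.integrity


def filter_on_receive(message, attr_labels, receiver):
    """Null every attribute whose value may not flow to the receiver."""
    return {
        name: value if can_flow(attr_labels[name], receiver) else None
        for name, value in message.items()
    }


def filter_on_send(message, attr_labels, sender):
    """Null every attribute the sender's label does not allow it to write."""
    return {
        name: value if can_flow(sender, attr_labels[name]) else None
        for name, value in message.items()
    }


MEDICAL = Label(secrecy=frozenset({"medical"}))
labels = {"ward": Label(), "diagnosis": MEDICAL}
message = {"ward": "B2", "diagnosis": {"code": "J10", "note": "flu"}}

public_view = filter_on_receive(message, labels, Label())
medical_view = filter_on_receive(message, labels, MEDICAL)
sent_by_medical = filter_on_send(message, labels, MEDICAL)
```

Nulling the `diagnosis` attribute also discards its sub-attributes, mirroring the rule that an attribute value is removed together with anything nested inside it.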

Enforcement is automatic, meaning that applications
using the messaging system can be subject to IFC enforcement
completely transparently (i.e. without their direct
involvement); though again, there is the interface for
the application to actively manage IFC where required.
In addition, the general reconfiguration capabilities of
the middleware enable connections between components
to be defined and managed at runtime, providing another
mechanism for controlling communication [45].

Fig. 6: IFC overhead of CamFlow-MW for a workload
transmission of 5000 messages (x-axis in ms).

5.3 Integrating With Persistent Storage 



One technique to provide IFC with persistent data stores
is to store the tags alongside the data, with a trusted
software component ensuring that when information is
read from the store, the corresponding labels are applied.
In Flume [14], a trusted process provides the interface
between untrusted applications and persistent storage.
More recent work has seen the emergence of databases
that natively understand IFC concepts and can enforce
IFC policies [25].

We see much promise in having the middleware mediate
between persistence systems and the kernel, to
ensure consistent IFC application. 

5.4 Evaluation 

As shown in Fig. 6, the results indicate that IFC enforcement
introduces an overhead of ~13% in performance
time compared to the standard, non IFC-enabled
middleware (see [49] for details). Note that these results
were measured in the context of a particular workload, 
deliberately designed to highlight the impact of IFC 
enforcement. It follows that the overheads associated 
with real-world usage are most likely less onerous. 

6 Audit: Data-Centric Logs 

IFC, in addition to providing strong assurances that policy
is being enforced, can also provide a data-centric log
[52] detailing the information flows within and between
system components. In addition to enforcing IFC via our
LSM module we log the data flows of labelled processes, 
policy decisions, privileges and IFC security context 
manipulations. Equivalent inter-machine operations via 
the middleware are also recorded. 

Cloud logging systems are generally based on legacy
logging systems (OS, web-server, database etc.) that
either fail to capture the needed information, or are
extremely complicated to interpret in a useful manner
[53]. More importantly, such logs tend to be relevant only
to the particular service or component, which makes it 
difficult, if not impossible, to audit across a range of 
applications, clouds, etc. 

IFC logs, as provided by our platform, allow us to capture
information on application-level data flows, both
attempted and permitted, allowing the correct expression
and implementation of data flow policy to be checked.
This provides transparency, allows for meaningful audit
in terms of investigating the circumstances in which data
leakage occurs, and provides evidence of compliance,
e.g. with legal obligations [54].


Fig. 7: Simplified audit graph from IFC OS execution (we
omit metadata for readability). The path to disclosure is
shown in blue/pale.

6.1 Analysing Paths to Disclosure 

To assist in interpreting log information, we build a directed
graph corresponding to the allowed flows during
the execution of our system, as shown in Fig. 7. The
flows defined in our IFC model (see §3), namely data
flow, creation flow, security context change and privilege
delegation, correspond to the edges of the directed
graph. Entities (such as processes, files, messages etc.)
are represented as nodes in the graph.

In addition to information necessary to build the graph
(as shown in the figure), additional metadata is collected
for forensic purposes, which is context/entity/event-dependent.
These audit entries, provided by the LSM
and middleware, can be exploited by a dedicated service
implemented in user space, connected to the kernel
collection mechanism via relayfs [55]. This service feeds,
for example, a graph visualisation tool such as Cytoscape
[56] or a graph database such as Neo4J.⁴

Such a directed graph helps one identify data leaks.
For example, a tenant might discover that some sensitive
medical data leaked into a data store where only
anonymised research data were supposed to be stored.
IFC is enforced in line with the policy encapsulated in
labels; thus data may leak if such policy is improperly expressed
and/or declassification/endorsement processes
are not correctly implemented (e.g. if the anonymisation
process in Fig. 2 allows re-identification).

4. http://neo4j.com/

Suppose that an information leak is suspected between
different security contexts $L_1[S, I]$ and $L_2[S', I']$.
Determining whether such a leak can occur is equivalent
to discovering whether there is a path in the
graph between the two contexts. If the leak occurred,
there must be a path between some entity $E_i$ such that
$S(E_i) = S \land I(E_i) = I$ and another entity $F_i$ such that
$S(F_i) = S' \land I(F_i) = I'$.

The existence of such a path demonstrates that a leak
is possible. To investigate whether a leak occurred, it is
essential to consider the event ID associated with the
edges comprising the path. We denote by $e_i$ the last incoming
edge to the entity under investigation with labels
$[S', I']$; only edges such that $e < e_i$ should be considered.
When applied to all nodes along a path, this rule ensures
strictly monotonically increasing timestamps from the
first node to the last. Fig. 7 shows in blue/pale a possible
data disclosure path, from file $F_2$, from a very simple
audit graph. We know from the event IDs $e_0$ and $e_1$ that
the data disclosure did not occur through file $F_1$ and
process $P_3$, but through $P_1$'s declassification.
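The event-ordering rule can be sketched as a search for paths whose event IDs strictly increase. This is a minimal sketch under stated assumptions: the audit graph is given as `(source, destination, event_id)` triples with non-negative integer event IDs, and the small graph below is invented for illustration and is not the graph of Fig. 7.

```python
# Hypothetical sketch: find disclosure paths with strictly increasing event IDs.
from collections import defaultdict


def disclosure_paths(edges, source, target):
    """Yield paths source -> target whose event IDs strictly increase."""
    outgoing = defaultdict(list)
    for src, dst, event_id in edges:
        outgoing[src].append((event_id, dst))

    def walk(node, last_event, path):
        if node == target:
            yield list(path)
            return
        for event_id, nxt in sorted(outgoing[node]):
            if event_id > last_event:          # later events only
                path.append((node, nxt, event_id))
                yield from walk(nxt, event_id, path)
                path.pop()

    yield from walk(source, -1, [])


# Illustrative graph: F2 is read by P1 and P3, both can reach F3.
edges = [
    ("F2", "P1", 1),   # P1 reads F2
    ("P1", "F3", 4),   # P1 later writes F3
    ("P3", "F1", 2),   # P3 writes F1 ...
    ("F1", "F3", 3),
    ("F2", "P3", 5),   # ... before it ever reads F2
]
paths = list(disclosure_paths(edges, "F2", "F3"))
```

Both routes exist in the graph, but only the one through `P1` respects the ordering, since `P3` wrote to `F1` before it read `F2`. Because event IDs must increase along a path, the search also terminates on cyclic graphs.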

6.2 Demonstrating Compliance 

Compliance with certain requirements can be demonstrated
through queries over the graph. We assume the
audit data is stored in a graph database that we can
query. For example, the following plain English policy:
"European personal data sent to the US must be anonymised"
[57], is equivalent to writing a query that verifies that
there is no path between EU- and US-labelled data
that bypasses an anonymisation process.

The policy "Medical data stored in database X must have
received proper consent and be anonymised" [54] can be
expressed as a query verifying that there is no path between
data labelled as medical and the database that bypasses the
consent-checking and anonymiser processes. In addition,
an investigator may want to know which anonymisation 
algorithm has been run, which data has been used to 
generate the anonymised records etc. Our audit graph 
assists in answering such questions. 
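Such a compliance query can be sketched as a reachability test that refuses to traverse the mandatory processing step. This is a minimal sketch under stated assumptions: the audit graph is held in memory as `(source, destination)` pairs rather than in a graph database, event ordering is ignored for brevity, and all node names are invented.

```python
# Hypothetical sketch of a compliance check over an in-memory audit graph.
from collections import deque


def reachable_without(edges, sources, targets, required):
    """Return True if a target can be reached from a source
    along a path that avoids every node in `required`."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = set(sources) - set(required)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in required:
                seen.add(nxt)
                queue.append(nxt)
    return False


edges = [
    ("eu_db", "anonymiser"),
    ("anonymiser", "us_store"),
    ("eu_db", "report_job"),
]
compliant = not reachable_without(edges, {"eu_db"}, {"us_store"}, {"anonymiser"})

# A later flow from the report job to the US store bypasses the anonymiser.
leaky = edges + [("report_job", "us_store")]
still_compliant = not reachable_without(leaky, {"eu_db"}, {"us_store"}, {"anonymiser"})
```

For a policy requiring two steps, such as consent checking and anonymisation, one option is to run the test once per required process and demand that both runs find no bypass.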

Note that IFC only applies guarantees with respect 
to flows. Demonstrating the overall effectiveness of the 
management regime, e.g. the quality and suitability of 
the anonymisation algorithm, is out-of-scope for a flow-based
enforcement mechanism.

6.3 Audit as ‘Big Data’ 

We are potentially generating a vast amount of data in 
our IFC logs. However, unlike standard system logs that 
are complex to analyse, our logs generate graphs that 
are ideal for analysis by "big data" tools that have been 
developed for this purpose [58].

Since the volume of data is potentially huge, the
amount being logged can be fine-tuned to meet
the requirements of the platform/tenant; e.g. by reducing
the amount of metadata being stored, by logging
only security context changing operations, by logging
only information corresponding to some target security
context, keeping operations on unlabelled entities outside
of the log etc. The decision on what needs to be
logged then becomes a tradeoff between utility and the 
volume (cost) of log generated, which can be decided in 
order to correspond to legal or contractual requirements 
(for example, a regulated sector may need to have a 
fine-grained log to satisfy data forensic requirements). 
Indeed, as such an approach is new to the cloud, such 
considerations will be refined by experience, with best 
practices developing over time. 

6.4 Audit Access 

Logs can contain sensitive information and access to
them should be controlled. This represents an area of
our ongoing work. Traditional access controls clearly
play a role; however, secrecy tags could also be leveraged.
For example, an auditor, before being granted
access to audit logs, could be forced to demonstrate
ownership of the corresponding secrecy IFC tags (for
example through cryptographic means, see §5.1). The
auditor may be granted access to a log entry only if
$S(origin) \cup S(destination) \subseteq S(auditor)$.
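The access condition on log entries can be written directly as a set comparison. In this minimal sketch the log entry layout (dictionaries with `origin` and `destination` secrecy sets) and the function names are hypothetical, and the step in which the auditor proves ownership of the tags is not modelled.

```python
# Hypothetical sketch of tag-based access to audit log entries.

def may_read_entry(entry, auditor_secrecy):
    """Grant access only if S(origin) | S(destination) is a subset of S(auditor)."""
    return (entry["origin"] | entry["destination"]) <= auditor_secrecy


def visible_entries(log, auditor_secrecy):
    return [entry for entry in log if may_read_entry(entry, auditor_secrecy)]


log = [
    {"id": 1, "origin": {"medical", "alice"}, "destination": {"medical", "alice"}},
    {"id": 2, "origin": {"medical", "bob"}, "destination": {"medical", "bob"}},
    {"id": 3, "origin": set(), "destination": {"medical"}},
]
alice_auditor = {"medical", "alice"}
visible = [entry["id"] for entry in visible_entries(log, alice_auditor)]
```

An auditor holding Alice's tags sees flows involving Alice and flows of unrestricted medical data, but nothing that concerns Bob.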

7 Example: Support for Web Services 


One of the most common uses of PaaS is to host web applications.
In this section we present the implementation
of such a solution built on the infrastructure described in
§4 in order to evaluate and demonstrate the feasibility of
our proposed approach. This is illustrated in Fig. 8. We
run standard and unmodified Ruby web applications. 

Interaction with end-users is achieved through a
"gateway" between the IFC and non-IFC worlds. Similarly,
interaction with cloud services (such as data stores)
is also achieved through our messaging middleware as
discussed in §5. The requirement for this gateway can
be removed if a trustworthy IFC implementation can be
provided at the client side, consistent with the cloud implementation
with respect to tag naming, enforcement,
etc. Tag naming in general, system-wide, is an issue
beyond the scope of this paper, see further §8. In our
proof of concept implementation the gateway is a simple
Apache server running a custom-built module.

The role of the gateway is to authenticate the end-user
when a session is created, and to associate this session
with an application instance running within the security
context corresponding to the user. Recall that a security
context comprises the $S$ and $I$ labels. Any further requests
to the gateway in that session are routed to the
corresponding application instance. Once an instance no
longer has an associated session it can be recycled using
self-checkpointing, as described in §4.1.1.

Several application types are running over our cloud-based,
web services platform. For example, in a medical
context these might be medical record editing, pharmacy
ordering, social services etc. A single, shared, identity
service for the end-user is part of the cloud provider
offering (in our proof of concept implementation we
used OAuth [59]).

As an example, a general practitioner (GP) authenticates,
is authorised as treating doctor
for Alice and selects the 'identity' that corresponds
to Alice. A new session is created server-side by the
gateway, with the requested application instance running
in the corresponding security context, with
$S = \{medical, alice\}$, $I = \emptyset$. When the GP wants to access
applications on behalf of a new patient, he needs to close
Alice's session, be authorised as treating doctor for Bob and
open a new session for Bob.
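The session handling in this scenario can be sketched as follows. This is a minimal sketch under stated assumptions: authentication and authorisation are reduced to a lookup in a set of (clinician, patient) pairs, a clinician has at most one open session, and the `Gateway` class is hypothetical and unrelated to the Apache module used in the proof of concept.

```python
# Hypothetical sketch of a gateway binding sessions to security contexts.

class Gateway:
    def __init__(self, authorised):
        # (clinician, patient) pairs; stands in for real authentication
        # and access control, which are out of scope for this sketch.
        self.authorised = authorised
        self.sessions = {}   # clinician -> security context of open session

    def open_session(self, clinician, patient):
        if (clinician, patient) not in self.authorised:
            raise PermissionError("not the treating doctor")
        if clinician in self.sessions:
            raise RuntimeError("close the current session first")
        context = {"S": frozenset({"medical", patient}), "I": frozenset()}
        self.sessions[clinician] = context
        return context

    def close_session(self, clinician):
        self.sessions.pop(clinician, None)


gateway = Gateway({("gp", "alice"), ("gp", "bob")})
alice_ctx = gateway.open_session("gp", "alice")
try:
    gateway.open_session("gp", "bob")     # Alice's session is still open
    switched_early = True
except RuntimeError:
    switched_early = False
gateway.close_session("gp")
bob_ctx = gateway.open_session("gp", "bob")
```

Each session gets its own secrecy label, so an application instance serving Alice never runs in a context that can also hold Bob's data.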

The control described above is not achieved by the 
application, but by the platform itself and can be controlled
by the end-user, subject to access control. That
is, a medical application used on Alice's behalf runs in 
a security context in which data cannot flow to that of 
another patient. Furthermore, applications running on 
behalf of a given user can share the data of that user 
without the risk of seeing a buggy application leaking 
data between end-users.

Fig. 8: PaaS Architecture on top of IFC-OS


As described in §4, we assume the middleware and
the OS enforcement are provided as a service by the
underlying platform. A tenant wanting to use the third-party,
web-service offering, once his trust in the underlying
platform is established, needs only to audit the
gateway; again, the underlying infrastructure provider
could either provide such a gateway or audit it. The
rest of the software stack of the third-party, web-service
provider is bound by the IFC enforcement mechanism
and therefore need not be trusted.

8 Conclusion & Future Work 

IFC allows data flows to be controlled continuously
throughout a system, by providing an information-centric
MAC scheme that continuously ensures non-interference
between security contexts. This paper presented
the CamFlow platform, that demonstrates the
potential of cloud-deployed IFC as supporting: (1) protection
of applications from each other; (2) flexible data
sharing across isolation boundaries; (3) prevention of
data leakage due to bugs/misconfigurations; (4) extension
of access control beyond application boundaries; (5)
data flow transparency.

Specifically, we detailed a new kernel implementation 
of IFC as an LSM, demonstrating low overhead even 
for worst-case scenarios, where processes continuously 
make read/write system calls. We also described the
integration of a messaging middleware to enforce IFC
across machines. This combination makes it possible to
provide whole-system IFC to PaaS cloud services, and 
therefore also SaaS. Our approach and implementation 
were designed so that applications can run unchanged 
over IFC, thus making cloud adoption feasible. 

We also indicated how the data-centric logs based on 
IFC enforcement could provide the means to audit an 
IFC-enabled system, whereby a log can be processed as 
a directed graph to investigate leaks and attacks and 
show compliance with data management requirements. 
Though this represents our initial work in the area, the
approach appears to hold much promise.

In light of the above, we believe that IFC has great 
potential as a security mechanism for the cloud whereby 
trust in a few major cloud providers, deploying IFC, 
can be built on to provide a demonstrably trustworthy 
computing environment. 


CamFlow was developed with cloud deployment in 
mind. Our future work will investigate the challenges of 
a broader distributed context. 

It is already feasible to extend CamFlow to support
mobile environments. Android supports the full
SELinux enforcement⁵ and an Android-LSM integration
has been demonstrated [60]. But when dealing with
multiple cloud services, particularly as they become part
of a wider distributed architecture such as in the IoT,
a trustworthy, system-wide deployment of IFC can no
longer be assumed. Much work remains on establishing
trust in (and the trustworthiness of) the IFC enforcement
mechanism within end-users' devices. Outside a cloud
context, all parties' trust in a common third party's enforcement
of IFC constraints cannot be assumed (unlike
the cloud provider for cloud services). We intend to
explore leveraging hardware roots of trust and remote
attestation to ensure the integrity and trustworthiness of
IFC enforcement mechanisms.

Another area of investigation concerns the representation
of tags across administrative domains. In a cloud
context, a federated approach can be envisaged where
a common understanding of tags could be negotiated
across multiple domains. For example, in [54] we discussed
initial thoughts on managing data according to
specific obligation regimes. However, as the number
of administrative domains increases, both a global tag
naming scheme and mechanisms for ad-hoc negotiation
become necessary.

Related is the sensitivity of the tags themselves.
Knowledge of the meaning of a tag can indicate that
the associated entity contains or deals with certain information.
If a relationship can be established between an
entity and the information owner, this may disclose private
information about the information owner. This may
lead to work on a need-to-know negotiation mechanism
to establish a secure channel between hosts, especially
in a wide-scale distributed system. A promising mechanism
is private set intersection to determine tags' subset
relationship (§3.4.1).

Another challenge concerns extending audit to distributed
architectures, both in terms of resource management
and regulating access to log data.

5. https://source.android.com/devices/tech/security/selinux